AITopics | double oracle algorithm

double oracle algorithm

XDO: A Double Oracle Algorithm for Extensive-Form Games

Neural Information Processing Systems, Dec-24-2025, 21:06:57 GMT

Policy Space Response Oracles (PSRO) is a reinforcement learning (RL) algorithm for two-player zero-sum games that has been empirically shown to find approximate Nash equilibria in large games. Although PSRO is guaranteed to converge to an approximate Nash equilibrium and can handle continuous actions, it may take an exponential number of iterations as the number of information states (infostates) grows. We propose Extensive-Form Double Oracle (XDO), an extensive-form double oracle algorithm for two-player zero-sum games that is guaranteed to converge to an approximate Nash equilibrium linearly in the number of infostates. Unlike PSRO, which mixes best responses at the root of the game, XDO mixes best responses at every infostate. We also introduce Neural XDO (NXDO), where the best response is learned through deep RL. In tabular experiments on Leduc poker, we find that XDO achieves an approximate Nash equilibrium in a number of iterations an order of magnitude smaller than PSRO. Experiments on a modified Leduc poker game and Oshi-Zumo show that tabular XDO achieves a lower exploitability than CFR with the same amount of computation. We also find that NXDO outperforms PSRO and NFSP on a sequential multidimensional continuous-action game. NXDO is the first deep RL method that can find an approximate Nash equilibrium in high-dimensional continuous-action sequential games.
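The double oracle loop that PSRO and XDO build on can be sketched on a small zero-sum matrix game. This is a minimal illustration under stated assumptions, not the XDO algorithm from the paper: it works on a normal-form payoff matrix rather than an extensive-form game, and the restricted game is solved approximately with regret matching self-play as a stand-in for an exact solver. The names `regret_matching` and `double_oracle`, and the parameters `tol`, `iters`, and `max_iters`, are hypothetical.

```python
def regret_matching(A, rows, cols, iters=5000):
    """Approximately solve the restricted zero-sum game A[rows][cols] by self-play."""
    def strategy(regrets):
        pos = [max(r, 0.0) for r in regrets]
        total = sum(pos)
        n = len(regrets)
        return [p / total for p in pos] if total > 0 else [1.0 / n] * n

    reg_x, reg_y = [0.0] * len(rows), [0.0] * len(cols)
    sum_x, sum_y = [0.0] * len(rows), [0.0] * len(cols)
    for _ in range(iters):
        x, y = strategy(reg_x), strategy(reg_y)
        # Expected payoff of each restricted action against the opponent's current mix.
        # The column player minimizes A, so its utility is the negated payoff.
        u_x = [sum(A[r][c] * y[j] for j, c in enumerate(cols)) for r in rows]
        u_y = [-sum(A[r][c] * x[i] for i, r in enumerate(rows)) for c in cols]
        ev_x = sum(p * u for p, u in zip(x, u_x))
        ev_y = sum(p * u for p, u in zip(y, u_y))
        reg_x = [g + u - ev_x for g, u in zip(reg_x, u_x)]
        reg_y = [g + u - ev_y for g, u in zip(reg_y, u_y)]
        sum_x = [s + p for s, p in zip(sum_x, x)]
        sum_y = [s + p for s, p in zip(sum_y, y)]
    # The time-averaged strategies approximate a restricted-game equilibrium.
    return [s / iters for s in sum_x], [s / iters for s in sum_y]


def double_oracle(A, tol=0.1, max_iters=100):
    """Grow both players' action sets with best responses until the gap is small."""
    n_rows, n_cols = len(A), len(A[0])
    rows, cols = [0], [0]  # arbitrary initial populations (an assumption)
    for _ in range(max_iters):
        x_sub, y_sub = regret_matching(A, rows, cols)
        # Lift the restricted strategies to the full action sets.
        x, y = [0.0] * n_rows, [0.0] * n_cols
        for p, r in zip(x_sub, rows):
            x[r] = p
        for p, c in zip(y_sub, cols):
            y[c] = p
        # Best-response oracles search the full game, not the restricted one.
        row_vals = [sum(A[r][c] * y[c] for c in range(n_cols)) for r in range(n_rows)]
        col_vals = [sum(A[r][c] * x[r] for r in range(n_rows)) for c in range(n_cols)]
        br_row = max(range(n_rows), key=row_vals.__getitem__)
        br_col = min(range(n_cols), key=col_vals.__getitem__)
        gap = row_vals[br_row] - col_vals[br_col]  # exploitability of (x, y)
        grew = False
        if br_row not in rows:
            rows.append(br_row)
            grew = True
        if br_col not in cols:
            cols.append(br_col)
            grew = True
        if gap <= tol or not grew:
            break
    return x, y, gap


# Usage: rock-paper-scissors, payoffs from the row player's point of view.
rps = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
x, y, gap = double_oracle(rps)
```

The returned `gap` is the sum of what each player could gain by deviating to a best response, so a small value indicates an approximate Nash equilibrium of the full game.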

approximate nash equilibrium, double oracle algorithm, name change, (10 more...)

Neural Information Processing Systems

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.59)

A Ablations

Neural Information Processing Systems, Aug-15-2025, 23:29:18 GMT

We find that past play greatly stabilizes the emergence of reciprocity in IPD. In cells containing another agent, we include the RUSP observations in these channels. In Figure 11 we show results when training with RUSP in these environments. Consistent with past work, the greedy baseline fails to reach a solution with high collective return. We use a distributed computing infrastructure used in Berner et al.

action head, agent, prisoner, (16 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.31)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.31)

XDO: A Double Oracle Algorithm for Extensive-Form Games

Neural Information Processing Systems, Jan-19-2025, 01:05:13 GMT

approximate nash equilibrium, double oracle algorithm, extensive-form game, (9 more...)

Neural Information Processing Systems

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.62)

XDO: A Double Oracle Algorithm for Extensive-Form Games

McAleer, Stephen, Lanier, John, Baldi, Pierre, Fox, Roy

arXiv.org Artificial Intelligence, Mar-10-2021

Policy Space Response Oracles (PSRO) is a deep reinforcement learning algorithm for two-player zero-sum games that has empirically found approximate Nash equilibria in large games. Although PSRO is guaranteed to converge to a Nash equilibrium, it may take an exponential number of iterations as the number of infostates grows. We propose Extensive-Form Double Oracle (XDO), an extensive-form double oracle algorithm that is guaranteed to converge to an approximate Nash equilibrium linearly in the number of infostates. Unlike PSRO, which mixes best responses at the root of the game, XDO mixes best responses at every infostate. We also introduce Neural XDO (NXDO), where the best response is learned through deep RL. In tabular experiments on Leduc poker, we find that XDO achieves an approximate Nash equilibrium in a number of iterations 1-2 orders of magnitude smaller than PSRO. In experiments on a modified Leduc poker game, we show that tabular XDO achieves over 11x lower exploitability than CFR and over 82x lower exploitability than PSRO and XFP in the same amount of time. We also show that NXDO beats PSRO and is competitive with NFSP on a large no-limit poker game.
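The contrast between mixing at the root and mixing at every infostate can be illustrated by counting the pure strategies available in each restricted game. The sketch below is a simplified reading of the abstract, not the paper's construction: it assumes each best response is a dictionary from infostate to action, that an infostate-level restricted game allows any action that some population member plays there, and it ignores reachability. The function names and the toy infostates are hypothetical.

```python
from math import prod


def root_mixture_size(population):
    """Root mixing: one restricted-game pure strategy per distinct population member."""
    return len({tuple(sorted(policy.items())) for policy in population})


def infostate_mixture_size(population):
    """Infostate mixing: at each infostate, any action played by some member is allowed."""
    infostates = set().union(*(policy.keys() for policy in population))
    return prod(
        len({policy[s] for policy in population if s in policy})
        for s in infostates
    )


# Two best responses over three infostates (toy example).
population = [
    {"s1": "fold", "s2": "fold", "s3": "fold"},
    {"s1": "call", "s2": "call", "s3": "call"},
]
print(root_mixture_size(population))       # 2
print(infostate_mixture_size(population))  # 8
```

Under these assumptions, two best responses already span eight combined pure strategies when recombined per infostate, which gives some intuition for why fewer iterations might be needed than when mixing only at the root.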

infostate, iteration, xdo, (15 more...)

arXiv.org Artificial Intelligence

2103.06426

Country:

North America > United States > California > San Diego County > San Diego (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

A Unified Game-Theoretic Approach to Multi-agent Reinforcement Learning

#artificialintelligence, Feb-13-2019, 14:13:11 GMT

Today we will dig into the paper A Unified Game-Theoretic Approach to Multi-agent Reinforcement Learning, one of the core ideas used in the development of #AlphaStar. Several concepts in AlphaStar won't be treated here. The aim is to dig into the concepts behind what has been described as the "Nash League" and how game theory came to mix with reinforcement learning. By the end of this article you should have a notion of the Double Oracle algorithm, Deep Cognitive Hierarchies, and Policy-Space Response Oracles. For this post you should be familiar with some game theory concepts, such as setting up a strategic game as a payoff matrix, Nash equilibria, and best responses.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

#artificialintelligence

Industry: Leisure & Entertainment > Games (0.59)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Security Games for Controlling Contagion

Tsai, Jason (University of Southern California) | Nguyen, Thanh H. (University of Southern California) | Tambe, Milind (University of Southern California)

AAAI Conferences, Jul-21-2012

Many strategic actions carry a 'contagious' component beyond the immediate locale of the effort itself. Viral marketing and peacekeeping operations have both been observed to have a spreading effect. In this work, we use counterinsurgency as our illustrative domain. Defined as the effort to block the spread of support for an insurgency, such operations lack the manpower to defend the entire population and must focus on the opinions of a subset of local leaders. As past researchers of security resource allocation have done, we propose using game theory to develop such policies and model the interconnected network of leaders as a graph. Unlike this past work in security games, actions in these domains possess a probabilistic, non-local impact. To address this new class of security games, we combine recent research in influence blocking maximization with a double oracle approach and create novel heuristic oracles to generate mixed strategies for a real-world leadership network from Afghanistan, synthetic leadership networks, and a real social network. We find that leadership networks that exhibit highly interconnected clusters can be solved equally well by our heuristic methods, but our more sophisticated heuristics outperform simpler ones in less interconnected social networks.

artificial intelligence, information management, social media, (20 more...)

AAAI Conferences

Twenty-Sixth AAAI Conference on Artificial Intelligence

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
Asia > Afghanistan (0.25)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)

Industry:

Information Technology (1.00)
Government > Military (1.00)
Government > Regional Government > North America Government > United States Government (0.94)
Leisure & Entertainment > Games > Computer Games (0.81)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Communications > Social Media (0.71)
Information Technology > Information Management > Search (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.46)

Multi-Step Multi-Sensor Hider-Seeker Games

Halvorson, Erik Daniel (Duke University) | Conitzer, Vincent (Duke University) | Parr, Ronald (Duke University)

AAAI Conferences, Jun-23-2009

We study a multi-step hider-seeker game where the hider is moving on a graph and, in each step, the seeker is able to search c subsets of the graph nodes. We model this game as a zero-sum Bayesian game, which can be solved in weakly polynomial time in the players' action spaces. The seeker's action space is exponential in c, and both players' action spaces are exponential in the game horizon. To manage this intractability, we use a column/constraint generation approach for both players. This approach requires an oracle to determine best responses for each player. However, we show that computing a best response for the seeker is NP-hard, even for a single-step game when c is part of the input, and that computing a best response is NP-hard for both players for the multi-step game, even if c = 1. An integer programming formulation of the best response for the hider is practical for moderate horizons, but computing an exact seeker best response is impractical due to the exponential dependence on both c and the horizon. We therefore develop an approximate best response oracle with bounded suboptimality for the seeker. We prove performance bounds on the strategy that results when column/constraint generation with approximate best responses converges, and we measure the performance of our algorithm in simulations. In our experimental results, column/constraint generation converges to near-minimax strategies for both players fairly quickly.
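The gap between an exact and an approximate seeker best response can be sketched for a single-step game. This is an illustration under several assumptions that the abstract does not state: the seeker chooses c subsets from a fixed candidate list, the hider is detected when its node lies in any chosen subset, and the approximate oracle is the standard greedy heuristic for maximum coverage (which is known to achieve at least a 1 - 1/e fraction of the optimum), not necessarily the authors' oracle. All function names are hypothetical.

```python
from itertools import combinations


def coverage(hider_probs, chosen):
    """Detection probability: total hider mass on nodes covered by the chosen subsets."""
    covered = set().union(*chosen) if chosen else set()
    return sum(hider_probs[v] for v in covered)


def exact_best_response(hider_probs, subsets, c):
    """Enumerate every choice of c subsets; the number of choices grows exponentially in c."""
    return max(combinations(subsets, c), key=lambda ch: coverage(hider_probs, ch))


def greedy_best_response(hider_probs, subsets, c):
    """Pick, c times, the subset that adds the most not-yet-covered hider mass."""
    chosen, covered = [], set()
    for _ in range(c):
        best = max(subsets, key=lambda s: sum(hider_probs[v] for v in s - covered))
        chosen.append(best)
        covered |= best
    return chosen


# Usage: a hider mixed strategy over four nodes and four candidate search subsets.
hider_probs = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}
subsets = [{0, 1}, {1, 2}, {2, 3}, {0}]
exact = exact_best_response(hider_probs, subsets, c=2)
greedy = greedy_best_response(hider_probs, subsets, c=2)
```

On this toy instance both oracles select subsets that cover all four nodes; on larger instances the greedy choice can fall short of the exact one while avoiding the enumeration over all c-subset combinations.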

algorithm, hider, seeker, (15 more...)

AAAI Conferences

Twenty-First International Joint Conference on Artificial Intelligence

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.86)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.35)
